

# INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

# Power Optimization in Domino Circuits using Stacked Transistors

P. Karthikeyan<sup>\*1</sup>, N. Saravanan<sup>2</sup>

<sup>\*1</sup> Assistant Professor, ECE, PSNA College of Engineering & Technology, Dindigul, Tamilnadu, India; <sup>2</sup> PG Scholar, PSNACET, Dindigul, India

karthickcnp@gmail.com

### Abstract

This work analyses a low-leakage, high-noise-immunity domino circuit. Power and noise immunity are usually optimized at the expense of speed, but the domino circuit described here has negligible speed degradation. The logic implementation network is separated from the keeper transistor by a current-comparison stage, in which the current of the pull-up network is compared against the worst-case leakage current. This improves the noise immunity and greatly reduces the contention between the keeper transistor and the evaluation network. Because the dynamic node is isolated from the logic implementation network, the parasitic capacitance on the dynamic node is greatly reduced, which compensates for the speed lost to the additional transistors and allows small keepers to be used in fast circuits. A footer transistor in diode configuration further reduces the leakage current.

Keywords: Domino Circuits.

#### Introduction

Dynamic logic such as domino logic is widely used in many applications to achieve high performance that cannot be achieved with static logic styles. However, the main drawback of dynamic logic families is that they are more sensitive to noise than static logic families. Moreover, as the technology scales down, the supply voltage is reduced for low power, and the threshold voltage (Vth) is also scaled down to maintain high performance. Since reducing the threshold voltage exponentially increases the subthreshold leakage current, reducing leakage current and improving noise immunity are major concerns in robust, high-performance designs in recent technology generations. In wide fan-in dynamic gates, especially wide fan-in OR gates, robustness and performance degrade significantly with increasing leakage current. As a result, it is difficult to obtain satisfactory robustness-performance tradeoffs.

In this paper, a new current-comparison-based domino (CCD) circuit for wide fan-in applications in ultradeep submicrometer technologies is proposed. The novelty of the proposed circuit is that it simultaneously increases performance and decreases leakage power consumption. The rest of this paper is arranged as follows. After the literature review in Section II, the proposed circuit is described in Section III. Section IV presents simulation results for the proposed circuit, obtained with T-SPICE in the 16-nm V2.1 high-performance predictive technology, and compares them with other conventional circuits. Section V concludes the paper.

## **Literature Review**

The most popular dynamic logic is the conventional standard domino circuit shown in Fig. 1. In this design, a PMOS keeper transistor is employed to prevent any undesired discharging of the dynamic node due to the leakage currents and charge sharing of the pull-down network (PDN) during the evaluation phase, thereby improving the robustness.



The traditional keeper approach is less effective in new generations of CMOS technology. Although keeper upsizing improves noise immunity, it increases current contention between the keeper transistor and the evaluation network. Thus, it increases the power consumption and evaluation delay of standard domino circuits. These problems are more critical in wide fan-in dynamic gates due to the large number of leaky NMOS transistors connected to the dynamic node. Hence, there is a tradeoff between robustness and performance, and the number of pull-down legs is limited. Existing techniques compromise one feature to gain in the other. Several circuit techniques have been proposed in the literature to address these issues, and they can be divided into two categories. In the first category, the techniques change the circuit that controls the gate voltage of the keeper, as in conditional-keeper domino (CKD) [5], high-speed domino (HSD) [6], leakage current replica (LCR) keeper domino [7], and controlled keeper by current-comparison domino (CKCCD) [8]. In the second category, which includes the proposed design, the techniques change the circuit topology of the footer transistor or re-engineer the evaluation network, as in diode-footed domino (DFD) [4] and diode-partitioned domino (DPD) [9].

http://www.ijesrt.com(C)International Journal of Engineering Sciences & Research Technology

### **Proposed CCD Design**

Since the capacitance of the dynamic node is large in wide fan-in gates, speed decreases dramatically. In addition, the noise immunity of the gate is reduced by the many parallel leaky paths in wide gates. Although upsizing the keeper transistor can improve noise robustness, power consumption and delay increase due to the larger contention. These problems would be solved if the network that implements the logic function were separated from the keeper transistor by a comparison stage in which the current of the pull-up network (PUN) is compared with the worst-case leakage current. This idea is conceptually illustrated in Fig. 3(a), which utilizes the PUN instead of the PDN. In effect, there is a race between the PUN current and the reference current. Transistor MK is added in series with the reference current to reduce power consumption when the voltage of the output node has fallen to ground voltage.



# ISSN: 2277-9655 Impact Factor: 1.852

An important issue in the generation of the reference voltage is the correct variation of the reference current according to the process variations, so that the robustness of the proposed circuit is maintained. Process variations are due to random and systematic parameter fluctuations [10]. In random variations, the parameters of each device vary individually and independently of adjacent devices. Systematic variations, however, affect the parameters of neighboring transistors in the same way, yielding a strong correlation between the parameters of nearby devices [11]. In this paper, systematic variations are considered. We have assumed that, in a given circuit design, the threshold voltages of all nMOS transistors vary together and those of all pMOS transistors vary together. In the proposed circuit, the effect of any threshold voltage variation on the voltages of nodes A and B [in Fig. 3(b)] is important because it directly affects the speed of the gate, and consequently its power consumption and noise immunity. The worst scenario is that the threshold voltage of the nMOS transistors decreases while that of the pMOS transistors increases, i.e., fast nMOS and slow pMOS. When the pMOS threshold voltage increases, the subthreshold leakage of the pMOS transistors of the PUN decreases, so the reference current must be reduced; the reverse holds when the threshold voltage decreases. Therefore, the reference current must vary with the threshold voltage to maintain robustness in this design. Several solutions for tracking process variations in dynamic logic circuits have been proposed in the literature, including a process variation sensor [12], a keeper based on the drain-induced barrier lowering (DIBL) effect [11], a rate-sensing keeper [13], and a replica keeper current [7].

In the proposed circuit, a replica circuit like that proposed in [7] can be used as a leakage current sensor for proper operation and superior performance. It is designed for the worst case of fan-in, i.e., a 64-input OR gate, because this gate has the maximum leakage current among the gates considered. The proposed circuit for generating the reference current for all gates is shown in Fig. 3(b). It is similar to the replica leakage circuit proposed in [7], with a series diode-connected transistor M6, similar to M1, added. As shown in Fig. 3(b), this circuit is a replica of the worst-case leakage current of the PUN, so that it correctly tracks leakage current variations due to process variations. Therefore, the gate of transistor M7 is connected to VDD, and its size is derived from the sizes of the pMOS transistors of the PUN in the worst case, i.e., a 64-input OR gate: its width is set equal to the sum of the widths of the 64 pMOS transistors of the PUN. In the proposed CCD circuit, as shown in Fig. 3(b), the current of the PUN is mirrored by transistor M2 and compared with the reference current, which replicates the leakage current of the PUN. The topology of the keeper transistors and the reference circuit, which is shared by all gates, is similar to that proposed in [7], which successfully tracked process, voltage, and temperature variations.
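The comparison described above can be illustrated numerically. The following is a behavioral sketch only: the leakage per unit width, the on-current, the transistor width, and the function names are illustrative assumptions, not values or names taken from the simulated circuit. It sizes the replica as the sum of 64 PUN transistor widths and checks whether the mirrored PUN current wins against the reference current.

```python
# Behavioral sketch of the current comparison; all numeric values are assumed.

def replica_width(pun_width_um, fan_in=64):
    """Width of the replica transistor: sum of the widths of the PUN pMOS devices."""
    return pun_width_um * fan_in


def reference_current(leak_per_um, pun_width_um, fan_in=64):
    """Worst-case PUN leakage, assumed proportional to the total off-transistor width."""
    return leak_per_um * replica_width(pun_width_um, fan_in)


def dyn_node_switches(i_pun, i_ref, mirror_ratio=1.0):
    """True when the mirrored PUN current exceeds the reference current."""
    return mirror_ratio * i_pun > i_ref


LEAK_PER_UM = 50e-9  # A/um, assumed off-state leakage per unit width
ON_CURRENT = 20e-6   # A, assumed current of one conducting PUN transistor
WIDTH_UM = 0.1       # um, assumed width of each PUN transistor

i_ref = reference_current(LEAK_PER_UM, WIDTH_UM)   # 64-input worst case
i_leak_8 = LEAK_PER_UM * WIDTH_UM * 8              # 8-input gate, all inputs idle

print(dyn_node_switches(i_leak_8, i_ref))               # leakage only
print(dyn_node_switches(i_leak_8 + ON_CURRENT, i_ref))  # one input active
```

Under these assumed numbers, leakage alone stays below the reference while a single conducting path exceeds it, which is the behavior the comparison stage relies on.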

The proposed circuit employs pMOS transistors to implement the logic function, as shown in Fig. 3(b). In the N-well process, the source and body terminals of the pMOS transistors can be connected together so that the body effect is eliminated. The threshold voltage of the transistors then varies only with the process variation and not with the body effect. Moreover, using pMOS transistors instead of nMOS ones in the N-well process prevents the threshold voltage from increasing through the body effect when there is a voltage drop across the diode-connected transistor *M*1, which decreases the delay.

Alternatively, one can use nMOS transistors in the P-well process to achieve a higher speed due to their higher mobility. Although the lower mobility of pMOS transistors decreases the speed, the reduced capacitance of the dynamic node in the proposed circuit allows the speed to be increased by a proper choice of the mirror ratio M [see (2)]. As shown in Fig. 3(b), the proposed circuit has five additional transistors and a shared reference circuit compared to standard footless domino (SFLD).

The proposed circuit can be considered as two stages. The first stage, the pre-evaluation network, includes the PUN and transistors MPre, MEval, and M1. The PUN, which implements the desired logic function, is disconnected from the dynamic node Dyn, unlike in traditional dynamic logic circuits, and changes the dynamic voltage indirectly. The second stage looks like a footless domino with one input [node A in Fig. 3(b)], without any charge sharing, with one transistor M2 regardless of the Boolean function implemented in the PUN, and with a controlled keeper consisting of two transistors. Only one pull-up transistor is connected to the dynamic node, instead of the *n* transistors of an *n*-bit OR gate, which reduces the capacitance on the dynamic node and yields a higher speed. The input signal of the second stage is prepared by the first stage.

In the evaluation phase, the dynamic power consumption thus consists of two parts: one for the first stage and one for the second stage. For a fixed frequency, power supply, and temperature, the dynamic power consumption depends directly on the capacitance, voltage swing, and contention current on the switching node. The first stage, with *n* inputs, has a lower voltage swing (VDD to VTHP) and no contention. The second stage has a rail-to-rail voltage swing with minimum contention. Although the proposed circuit has some area overhead, it has lower dynamic power consumption than footless domino.
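The dependence of dynamic power on capacitance, voltage swing, and contention current can be made concrete with a first-order estimate. The sketch below assumes an activity factor of one and models each node as drawing C·V_swing·V_DD per cycle plus a contention term. The capacitances, the reduced first-stage swing of 0.5 V, and the contention currents and durations are all illustrative assumptions, so the printed numbers are not simulation results.

```python
# First-order dynamic power estimate per switching node; all values are assumed.

def dynamic_power(c_node, v_swing, vdd, freq, i_contention=0.0, t_contention=0.0):
    """Average supply power in watts for one node switching every cycle."""
    switching = c_node * v_swing * vdd * freq              # charge C*V_swing drawn at VDD
    contention = i_contention * vdd * t_contention * freq  # short-circuit-like term
    return switching + contention


VDD = 0.8    # V, supply voltage used in the simulations
FREQ = 1e9   # Hz, assumed clock frequency

# Conventional wide gate: large dynamic-node capacitance, rail-to-rail swing.
conventional = dynamic_power(20e-15, VDD, VDD, FREQ,
                             i_contention=5e-6, t_contention=20e-12)

# Two-stage gate: large node with reduced swing and no contention,
# plus a small rail-to-rail dynamic node with a small contention current.
stage1 = dynamic_power(20e-15, 0.5, VDD, FREQ)
stage2 = dynamic_power(2e-15, VDD, VDD, FREQ,
                       i_contention=1e-6, t_contention=20e-12)
two_stage = stage1 + stage2

print(f"conventional: {conventional * 1e6:.2f} uW")
print(f"two-stage:    {two_stage * 1e6:.2f} uW")
```

The point of the sketch is only the structure of the trade-off: the large capacitance is moved to a node with reduced swing, while the rail-to-rail node carries a small capacitance.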

Transistor M1 is configured in diode connection, i.e., its gate and drain terminals are connected together. In the evaluation mode, the current of the PUN transistors establishes a voltage drop across M1. This voltage is low if all inputs are at the high level and only leakage current flows in the PUN and mirror transistor M2. Otherwise, if at least one conductive path exists between node A and ground, for example when one input of the OR gate goes low, this voltage drop rises, turning on mirror transistor M2 and changing the output voltage.

The voltage drop across transistor M1 causes the gate-source voltage of the off transistors in the PUN to become positive, yielding an exponential reduction in subthreshold leakage due to the phenomenon called the stacking effect [14]. It should be noted that if the body effect is not eliminated due to the unequal voltage of the source and body terminals, the leakage current will be decreased further at the expense of higher deviation due to process variations.
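The exponential dependence mentioned above can be sketched with the standard subthreshold relation, in which leakage scales as exp(-ΔV_GS / (n·v_T)) for a gate-source shift ΔV_GS that drives the transistor further off. The subthreshold slope factor n = 1.5 and the example voltage drops are assumptions for illustration, and the model ignores DIBL and the body effect.

```python
import math

# Boltzmann constant divided by the electron charge, in volts per kelvin.
K_B_OVER_Q = 8.617333262e-5


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in volts at the given temperature in degrees Celsius."""
    return K_B_OVER_Q * (temp_c + 273.15)


def leakage_scale(v_gs_shift, n=1.5, temp_c=110.0):
    """Factor by which subthreshold leakage shrinks for a gate-source shift
    that turns the device further off (n is an assumed slope factor)."""
    return math.exp(-v_gs_shift / (n * thermal_voltage(temp_c)))


for drop_mv in (0, 25, 50, 100):  # assumed voltage drops across the diode footer
    scale = leakage_scale(drop_mv / 1000.0)
    print(f"drop = {drop_mv:3d} mV -> leakage x {scale:.3f}")
```

Even a few tens of millivolts across the footer reduce the modeled leakage noticeably at 110 °C, which is why a small diode-connected transistor is sufficient for this purpose.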

The voltage across the diode footer in other domino circuits that use diode-footed techniques such as [4] and [8] must be decreased to zero in order to lower the dynamic node voltage to zero. But in the proposed circuit, it is not necessary for this voltage to reach 0 V since the current of the diode footer is needed instead of the voltage across it. Therefore, the size of the diode-footer transistor M1 in the proposed circuit is smaller than other DFD circuits. Consequently, a lower leakage current must be compensated by the keeper transistors instead of the larger one in the other circuit due to the larger size of the footer and mirror transistors. This results in lower delay and power consumption and area overhead. On the other hand, in the next predischarge mode, the dynamic node is charged from nonzero voltage to power supply voltage, yielding reduction in the power consumption with respect to existence of the large capacitance on the dynamic node in wide fan-in gates, especially wide fan-in OR gates. In addition, since transistor M1 increases the switching threshold voltage of the pMOS transistors, the new switching threshold voltage of the gate is about twice the threshold voltage of the pMOS devices [4].

With reference to the circuit schematic shown in Fig. 3(b), two phases of the proposed circuit are explained in detail as follows.

A. Predischarge Phase

Input signals and clock voltage are at the high and low levels, respectively [CLK = "0", $\overline{\text{CLK}}$ = "1" in Fig. 3(b)], in this phase. Therefore, the voltage of the dynamic node (Dyn) is pulled down to the low level by transistor MDis, and the voltage of node A is raised to the high level by transistor MPre. Hence, transistors MPre, MDis, Mk1, and Mk2 are on and transistors M1, M2, and MEval are off. Also, the output voltage is raised to the high level by the output inverter.

B. Evaluation Phase

In this phase, the clock voltage is at the high level [CLK = "1", $\overline{\text{CLK}}$ = "0" in Fig. 3(b)] and the input signals can be at the low level. Hence, transistors MPre and MDis are off, transistors M1, M2, Mk2, and MEval are on, and transistor Mk1 can be on or off depending on the input voltages. Two states may occur: either all of the input signals remain high, or at least one input falls to the low level.

In the first state, a small voltage is established across transistor M1 due to the leakage current. Although this leakage current is mirrored by transistor M2, the keeper transistors of the second stage (Mk1 and Mk2) compensate for the mirrored leakage current. Upsizing transistor M1 and increasing the mirror ratio (M) increase the speed through a higher mirrored current, at the expense of noise-immunity degradation.

In the second state, when at least one conduction path exists, the pull-up current rises and the voltage of node A decreases to a nonzero voltage equal to the gate-source voltage of the saturated transistor M1. This voltage is also equal to the drain-source voltage of M1 and depends on the size of M1 and its current. Increasing the pull-up current increases the mirrored current in transistor M2, so the dynamic node Dyn is charged to VDD, which discharges the output node and turns off the main keeper transistor Mk1. This technique mitigates the contention current between the keeper transistor and the mirror transistor.

Fig. 4 shows the simulated waveforms of the proposed circuit for the 8-input OR gate. The waveforms are obtained with the T-SPICE simulator using the 16-nm high-performance V2.1 predictive technology models (PTMs) [3] at 110 °C and a 0.8 V supply voltage. In this simulation, only one input of the 8-input OR gate falls to the low level in the evaluation phase. The simulation is performed with Wp/Wn = 2 for the output inverter, CL = 5 fF, and minimum size for the other transistors.
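The two phases described above can be summarized at the logic level. The function below is a hypothetical behavioral model, not a circuit simulation: it ignores timing, leakage, and the keeper, and only reproduces the node levels stated in the text (Dyn low and output high during predischarge; Dyn charged and output low when at least one input falls during evaluation).

```python
def ccd_or_gate(clk, inputs):
    """Logic-level sketch of the CCD gate with active-low inputs.

    clk = 0 is the predischarge phase, clk = 1 the evaluation phase.
    Returns the logic levels of the dynamic node and the output.
    """
    if clk == 0:
        # Predischarge: MDis pulls Dyn low, so the output inverter drives high.
        dyn = 0
    else:
        # Evaluation: a low input opens a conduction path in the PUN; the
        # mirrored current then charges Dyn and the output falls.
        dyn = 1 if any(level == 0 for level in inputs) else 0
    return {"dyn": dyn, "out": 1 - dyn}


all_high = [1] * 8
one_low = [1, 1, 0, 1, 1, 1, 1, 1]

print(ccd_or_gate(0, one_low))   # predischarge
print(ccd_or_gate(1, all_high))  # evaluation, all inputs high
print(ccd_or_gate(1, one_low))   # evaluation, one input low
```

The third call corresponds to the simulated case of the 8-input OR gate in which a single input falls during evaluation.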

### **Simulation Results**

The following circuits are simulated using Tanner Spice. The input/output waveforms and instantaneous powers are calculated. The average power is computed as the mean of the instantaneous power.
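As an illustration of the averaging step, the sketch below computes the mean power from exported waveform samples. It assumes the supply voltage and supply current are available as plain lists sampled at the same time points; the function name and the sample data are hypothetical. Because transient simulators often use non-uniform time steps, the mean is time-weighted with the trapezoidal rule instead of a plain arithmetic mean of the samples.

```python
def average_power(times, voltages, currents):
    """Time-weighted mean of p(t) = v(t) * i(t) using the trapezoidal rule."""
    if not (len(times) == len(voltages) == len(currents)):
        raise ValueError("all waveforms must have the same number of samples")
    if len(times) < 2:
        raise ValueError("at least two samples are required")

    power = [v * i for v, i in zip(voltages, currents)]  # instantaneous power
    energy = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        energy += 0.5 * (power[k] + power[k - 1]) * dt   # trapezoid area
    return energy / (times[-1] - times[0])


# Hypothetical samples: a triangular 10 uA current pulse at a 0.8 V supply.
t = [0.0, 1e-9, 2e-9]
v = [0.8, 0.8, 0.8]
i = [0.0, 10e-6, 0.0]

print(f"average power: {average_power(t, v, i) * 1e6:.2f} uW")
```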





**Fig. 4(a)** 



The power consumption of the various circuits obtained by simulation is compared against that of SFLD in Table 1 below.

| Fan-In | Metric | Standard footless domino (SFLD) | Conditional-keeper domino (CKD) | High-speed domino (HSD) | Leakage current replica (LCR) | Current comparison domino (CCD) |
|--------|--------|---------------------------------|---------------------------------|-------------------------|-------------------------------|---------------------------------|
| 8 | Power (µW) | 9.7 | 14.1 | 14.4 | 7.2 | 7.8 |
| 8 | Normalized power | 1 | 1.45 | 1.5 | 0.7 | 0.8 |
| 16 | Power (µW) | 12.3 | 16.4 | 14.4 | 10.1 | 8 |
| 16 | Normalized power | 1 | 1.33 | 1.2 | 0.8 | 0.7 |
| 32 | Power (µW) | 17 | 21 | 18.3 | 14 | 8.4 |
| 32 | Normalized power | 1 | 1.24 | 1.1 | 0.8 | 0.5 |
| 64 | Power (µW) | 19.4 | 21.5 | 18.5 | 15.2 | 9.1 |
| 64 | Normalized power | 1 | 1.11 | 1 | 0.8 | 0.5 |
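The normalized rows of Table 1 can be recomputed from the absolute power values. The sketch below transcribes the power figures from the table and divides each by the SFLD value at the same fan-in; the dictionary layout and function name are illustrative choices. Recomputed ratios may differ from the table entries in the last digit because the table entries are rounded.

```python
# Power in microwatts, transcribed from Table 1 (fan-in -> circuit -> power).
POWER_UW = {
    8:  {"SFLD": 9.7,  "CKD": 14.1, "HSD": 14.4, "LCR": 7.2,  "CCD": 7.8},
    16: {"SFLD": 12.3, "CKD": 16.4, "HSD": 14.4, "LCR": 10.1, "CCD": 8.0},
    32: {"SFLD": 17.0, "CKD": 21.0, "HSD": 18.3, "LCR": 14.0, "CCD": 8.4},
    64: {"SFLD": 19.4, "CKD": 21.5, "HSD": 18.5, "LCR": 15.2, "CCD": 9.1},
}


def normalize(row, baseline="SFLD"):
    """Express every entry of one table row relative to the baseline circuit."""
    return {name: value / row[baseline] for name, value in row.items()}


for fan_in, row in POWER_UW.items():
    ratios = {name: round(r, 2) for name, r in normalize(row).items()}
    print(f"fan-in {fan_in:2d}: {ratios}")
```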

#### Conclusion

From the power comparison table, it is seen that the proposed CCD consumes less power than the other domino circuits compared, except at a fan-in of 8, where LCR is slightly lower. The noise immunity is also equally good, and there is very little speed degradation.

#### **References**

- [1] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2003.
- [2] L. Wang, R. Krishnamurthy, K. Soumyanath, and N. Shanbhag, "An energy-efficient leakage-tolerant dynamic circuit technique," in Proc. Int. ASIC/SoC Conf., 2000, pp. 221–225.
- [3] Predictive Technology Model (PTM). 16 nm High Performance V2.1 Technology of PTM Model. (2012, Feb. 19) [Online]. Available: http://www.eas.asu.edu/~ptm/
- [4] H. Mahmoodi and K. Roy, "Diode-footed domino: A leakage-tolerant high fan-in dynamic circuit design style," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 3, pp. 495–503, Mar. 2004.
- [5] A. Alvandpour, R. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, "A sub-130-nm conditional-keeper technique," IEEE J. Solid-State Circuits, vol. 37, no. 5, pp. 633–638, May 2002.
- [6] M. H. Anis, M. W. Allam, and M. I. Elmasry, "Energy-efficient noise-tolerant dynamic styles for scaled-down CMOS and MTCMOS technologies," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 10, no. 2, pp. 71–78, Apr. 2002.
- [7] Y. Lih, N. Tzartzanis, and W. W. Walker, "A leakage current replica keeper for dynamic circuits," IEEE J. Solid-State Circuits, vol. 42, no. 1, pp. 48–55, Jan. 2007.
- [8] A. Peiravi and M. Asyaei, "Robust low leakage controlled keeper by current-comparison domino for wide fan-in gates," Integration, VLSI J., vol. 45, no. 1, pp. 22–32, 2012.
- [9] H. Suzuki, C. H. Kim, and K. Roy, "Fast tag comparator using diode partitioned domino for 64-bit microprocessors," IEEE Trans. Circuits Syst., vol. 54, no. 2, pp. 322–328, Feb. 2007.
- [10] K. Bowman, S. G. Duval, and J. D. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," IEEE J. Solid-State Circuits, vol. 37, no. 2, pp. 183–190, Feb. 2002.
- [11] H. F. Dadgour and K. Banerjee, "A novel variation-tolerant keeper architecture for high-performance low-power wide fan-in dynamic OR gates," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 11, pp. 1567–1577, Nov. 2010.
- [12] C. H. Kim, K. Roy, S. Hsu, R. Krishnamurthy, and S. Borkar, "A process variation compensating technique with an on-die leakage current sensor for nanometer scale dynamic circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 6, pp. 646–649, Jun. 2006.
- [13] R. G. David Jeyasingh, N. Bhat, and B. Amrutur, "Adaptive keeper design for dynamic logic circuits using rate sensing technique," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 2, pp. 295–304, Feb. 2011.
- [14] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.
- [15] N. Shanbhag, K. Soumyanath, and S. Martin, "Reliable low-power design in the presence of deep submicron noise," in Proc. ISLPED, 2000, pp. 295–302.
- [16] M. Alioto, G. Palumbo, and M. Pennisi, "Understanding the effect of process variations on the delay of static and domino logic," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 5, pp. 697–710, May 2010.
- [17] H. Mostafa, M. Anis, and M. Elmasry, "Novel timing yield improvement circuits for high-performance low-power wide fan-in dynamic OR gates," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 10, pp. 1785–1797, Aug. 2011.

http://www.ijesrt.com(C)International Journal of Engineering Sciences & Research Technology [842-846]